
Data-Driven Dynamic Parameter Learning of manipulator robots

Elseiagy, Mohammed, Alemayoh, Tsige Tadesse, Bezerra, Ranulfo, Kojima, Shotaro, Ohno, Kazunori

arXiv.org Artificial Intelligence

Bridging the sim-to-real gap remains a fundamental challenge in robotics, as accurate dynamic parameter estimation is essential for reliable model-based control, realistic simulation, and safe deployment of manipulators. Traditional analytical approaches often fall short when faced with complex robot structures and interactions. Data-driven methods offer a promising alternative, yet conventional neural networks such as recurrent models struggle to capture the long-range dependencies critical for accurate estimation. In this study, we propose a Transformer-based approach for dynamic parameter estimation, supported by an automated pipeline that generates diverse robot models and enriched trajectory data using Jacobian-derived features. The dataset consists of 8,192 robots with varied inertial and frictional properties. Leveraging attention mechanisms, our model effectively captures both temporal and spatial dependencies. Experimental results highlight the influence of sequence length, sampling rate, and architecture, with the best configuration (sequence length 64, 64 Hz, four layers, 32 heads) achieving a validation R² of 0.8633. Mass and inertia are estimated with near-perfect accuracy, Coulomb friction with moderate-to-high accuracy, while viscous friction and distal link center-of-mass remain more challenging. These results demonstrate that combining Transformers with automated dataset generation and kinematic enrichment enables scalable, accurate dynamic parameter estimation, contributing to improved sim-to-real transfer in robotic systems.
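As a rough illustration of the abstract's core idea (attention over a fixed-length trajectory window, regressed to dynamic parameters), the sketch below runs one untrained self-attention layer over a 64-step window and pools it into a parameter vector. The feature width (21), parameter count (10), and random weights are illustrative assumptions, not the paper's four-layer, 32-head architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_regressor(seq, d_model=32, n_params=10, seed=0):
    """Map a (T, F) trajectory window to a vector of dynamic-parameter
    estimates via one self-attention layer plus mean pooling (untrained)."""
    rng = np.random.default_rng(seed)
    T, F = seq.shape
    W_in = rng.normal(scale=F ** -0.5, size=(F, d_model))
    Wq, Wk, Wv = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
                  for _ in range(3))
    W_out = rng.normal(scale=d_model ** -0.5, size=(d_model, n_params))
    x = seq @ W_in                               # embed each timestep
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model))   # (T, T) temporal attention
    pooled = (attn @ v).mean(axis=0)             # aggregate over the window
    return pooled @ W_out                        # per-robot parameter estimates

# 64-step window at 64 Hz; 21 features stands in for q, q-dot, tau plus
# Jacobian-derived terms (a made-up width, for illustration only)
window = np.random.default_rng(1).normal(size=(64, 21))
theta_hat = attention_regressor(window)
print(theta_hat.shape)  # (10,)
```

In the trained setting the output head would target the stacked mass, inertia, center-of-mass, and friction coefficients per link.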


Manifold-Aware Diffusion-Augmented Contrastive Learning for Noise-Robust Biosignal Representation

Zewail, Rami

arXiv.org Artificial Intelligence

Learning robust representations for physiological time-series signals continues to pose a substantial challenge in developing efficient few-shot learning applications. This difficulty is largely due to the complex pathological variations in biosignals. In this context, this paper introduces a manifold-aware Diffusion-Augmented Contrastive Learning (DACL) framework, which combines the generative structure of latent diffusion models with the discriminative power of supervised contrastive learning. The proposed framework operates within a contextualized scattering latent space derived from Scattering Transformer (ST) features. Within a contrastive learning framework, we employ a forward diffusion process in the scattering latent space as a structured, manifold-aware feature augmentation technique. We assessed the proposed framework on the PhysioNet 2017 ECG benchmark dataset. The proposed method achieved a competitive AUROC of 0.9741 in the task of detecting atrial fibrillation from a single-lead ECG signal, performance on par with relevant state-of-the-art works. In-depth evaluation findings suggest that early-stage diffusion serves as an ideal "local manifold explorer," producing embeddings with greater precision than typical augmentation methods while preserving inference efficiency.
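The augmentation step described above (a forward diffusion process applied to latent embeddings) can be sketched in a few lines. The noise schedule and timestep values here are standard DDPM-style assumptions, not the paper's exact settings: a small timestep perturbs an embedding gently along the local manifold, while a large one destroys it.

```python
import numpy as np

def diffusion_augment(z0, t, T=1000, beta_min=1e-4, beta_max=0.02, rng=None):
    """Forward-diffusion augmentation of a latent embedding:
    q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I),
    with a linear beta schedule (an assumed, standard choice)."""
    rng = rng or np.random.default_rng()
    betas = np.linspace(beta_min, beta_max, T)
    a_bar = np.cumprod(1.0 - betas)[t]
    return np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * rng.normal(size=z0.shape)

z0 = np.ones(8)  # stand-in for a scattering-latent embedding
z_near = diffusion_augment(z0, t=20, rng=np.random.default_rng(0))   # mild, manifold-local
z_far = diffusion_augment(z0, t=900, rng=np.random.default_rng(0))   # nearly pure noise
```

In a supervised contrastive setup, `z_near`-style views of same-class embeddings would be pulled together while different classes are pushed apart.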


Towards a Safer and Sustainable Manufacturing Process: Material classification in Laser Cutting Using Deep Learning

Salem, Mohamed Abdallah, Ashur, Hamdy Ahmed, Elshinnawy, Ahmed

arXiv.org Artificial Intelligence

Laser cutting is a widely adopted technology in material processing across various industries, but it generates a significant amount of dust, smoke, and aerosols during operation, posing a risk to both the environment and workers' health. Speckle sensing has emerged as a promising method to monitor the cutting process and identify material types in real-time. This paper proposes a material classification technique using a speckle pattern of the material's surface based on deep learning to monitor and control the laser cutting process. The proposed method involves training a convolutional neural network (CNN) on a dataset of laser speckle patterns to recognize distinct material types for safe and efficient cutting. Previous methods for material classification using speckle sensing may face issues when the color of the laser used to produce the speckle pattern is changed. Experiments conducted in this study demonstrate that the proposed method achieves high accuracy in material classification, even when the laser color is changed. The model achieved an accuracy of 98.30% on the training set and 96.88% on the validation set. Furthermore, the model was evaluated on a set of 3000 new images for 30 different materials, achieving an F1-score of 0.9643. The proposed method provides a robust and accurate solution for material-aware laser cutting using speckle sensing.
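A minimal sketch of the speckle-classification front end, assuming an untrained random filter bank rather than the paper's trained CNN: convolve the speckle image, apply ReLU, and global-average-pool into a per-image feature vector that a classifier head could consume.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation on a single-channel image."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def speckle_features(img, kernels):
    """Convolve, ReLU, global-average-pool: one untrained CNN-style stage."""
    return np.array([np.maximum(conv2d(img, k), 0).mean() for k in kernels])

rng = np.random.default_rng(0)
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]  # stand-in filter bank
feats = speckle_features(rng.normal(size=(16, 16)), kernels)  # fake speckle image
print(feats.shape)  # (4,)
```

A trained network would additionally stack such stages and end in a softmax over the 30 material classes.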


Accelerating scientific discovery with the common task framework

Kutz, J. Nathan, Battaglia, Peter, Brenner, Michael, Carlberg, Kevin, Hagberg, Aric, Ho, Shirley, Hoyer, Stephan, Lange, Henning, Lipson, Hod, Mahoney, Michael W., Noe, Frank, Welling, Max, Zanna, Laure, Zhu, Francis, Brunton, Steven L.

arXiv.org Artificial Intelligence

Machine learning (ML) and artificial intelligence (AI) algorithms are transforming and empowering the characterization and control of dynamic systems in the engineering, physical, and biological sciences. These emerging modeling paradigms require comparative metrics to evaluate a diverse set of scientific objectives, including forecasting, state reconstruction, generalization, and control, while also considering limited data scenarios and noisy measurements. We introduce a common task framework (CTF) for science and engineering, which features a growing collection of challenge data sets with a diverse set of practical and common objectives. The CTF is a critically enabling technology that has contributed to the rapid advance of ML/AI algorithms in traditional applications such as speech recognition, language processing, and computer vision. There is a critical need for the objective metrics of a CTF to compare the diverse algorithms being rapidly developed and deployed in practice today across science and engineering.


Distilling LLM Agent into Small Models with Retrieval and Code Tools

Kang, Minki, Jeong, Jongwon, Lee, Seanie, Cho, Jaewoong, Hwang, Sung Ju

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise computation, where sLMs often hallucinate due to limited capability. In this work, we propose Agent Distillation, a framework for transferring not only reasoning capability but full task-solving behavior from LLM-based agents into sLMs with retrieval and code tools. We improve agent distillation along two complementary axes: (1) we introduce a prompting method called first-thought prefix to enhance the quality of teacher-generated trajectories; and (2) we propose self-consistent action generation to improve the test-time robustness of small agents. We evaluate our method on eight reasoning tasks across factual and mathematical domains, covering both in-domain and out-of-domain generalization. Our results show that sLMs as small as 0.5B, 1.5B, and 3B parameters can achieve performance competitive with next-tier larger 1.5B, 3B, and 7B models fine-tuned using CoT distillation, demonstrating the potential of agent distillation for building practical, tool-using small agents. Our code is available at https://github.com/Nardien/agent-distillation.
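The second axis, self-consistent action generation, amounts at its simplest to sampling several candidate actions and keeping the consensus. The sketch below majority-votes over raw action strings; the paper's procedure may well differ (for example, voting on execution results), so treat this as a simplified stand-in.

```python
from collections import Counter

def self_consistent_action(sampled_actions):
    """Keep the most frequent of several actions sampled from a small agent.
    Simplified stand-in for self-consistent action generation; ties resolve
    to the first candidate encountered."""
    return Counter(sampled_actions).most_common(1)[0][0]

# hypothetical tool-call strings sampled from a small agent
samples = ["search('capital of Iceland')", "calc(2 + 2)",
           "search('capital of Iceland')"]
print(self_consistent_action(samples))  # search('capital of Iceland')
```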


Scattering Transformer: A Training-Free Transformer Architecture for Heart Murmur Detection

Zewail, Rami

arXiv.org Artificial Intelligence

In an attempt to address the need for skilled clinicians in heart sound interpretation, recent research efforts on automating cardiac auscultation have explored deep learning approaches. The majority of these approaches have been based on supervised learning, which is challenged whenever training data is limited. More recently, there has been growing interest in the potential of pre-trained self-supervised audio foundation models for biomedical end tasks. Despite exhibiting promising results, these foundation models are typically computationally intensive. Within the context of automatic cardiac auscultation, this study explores a lightweight alternative to these general-purpose audio foundation models by introducing the Scattering Transformer, a novel, training-free transformer architecture for heart murmur detection. The proposed method leverages standard wavelet scattering networks by introducing contextual dependencies in a transformer-like architecture without any backpropagation. We evaluate our approach on the public CirCor DigiScope dataset, directly comparing it against leading general-purpose foundation models. The Scattering Transformer achieves a Weighted Accuracy (WAR) of 0.786 and an Unweighted Average Recall (UAR) of 0.697, demonstrating performance highly competitive with contemporary state-of-the-art methods. This study establishes the Scattering Transformer as a viable and promising alternative in resource-constrained setups.
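The "contextual dependencies without backpropagation" idea can be illustrated by attention in which queries, keys, and values are all the raw scattering features themselves, so no weights exist to train. The frame count and feature width below are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def training_free_attention(S):
    """Self-attention with Q = K = V = S: each frame's scattering vector is
    replaced by a similarity-weighted mixture of all frames, adding context
    across time. No parameters are learned, so no backpropagation is needed."""
    d = S.shape[-1]
    return softmax(S @ S.T / np.sqrt(d)) @ S

S = np.random.default_rng(0).normal(size=(10, 6))  # 10 frames of scattering features
C = training_free_attention(S)
print(C.shape)  # (10, 6)
```

The contextualized features `C` could then feed a lightweight classifier for murmur detection.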


The search for Cleopatra's long-lost tomb leads to sunken seaport

Popular Science

A new documentary explores this 2,000-year-old mystery and a connection to the RMS Titanic. She's among the most famous leaders in world history, yet archaeologists still don't know the location of Egyptian Queen Cleopatra's tomb. Now, National Geographic Explorer and archaeologist Dr. Kathleen Martínez and her team have uncovered a major clue in their 20-year-long hunt: the remains of a port off Egypt's Mediterranean coast. The previously unknown ancient port could have been used to keep the Egyptian queen's remains out of Roman hands.


Morphological Synthesizer for Ge'ez Language: Addressing Morphological Complexity and Resource Limitations

Gebremariam, Gebrearegawi, Teklehaymanot, Hailay, Mezgebe, Gebregewergs

arXiv.org Artificial Intelligence

Ge'ez is an ancient Semitic language renowned for its unique alphabet. It serves as the script for numerous languages, including Tigrinya and Amharic, and played a pivotal role in Ethiopia's cultural and religious development during the Aksumite kingdom era. Ge'ez remains significant as a liturgical language in Ethiopia and Eritrea, with much of the national identity documentation recorded in Ge'ez. These written materials are invaluable primary sources for studying Ethiopian and Eritrean philosophy, creativity, knowledge, and civilization. Ge'ez has a complex morphological structure with rich inflectional and derivational morphology, yet no usable NLP tools have been developed and published to date due to the scarcity of annotated linguistic data, corpora, labeled datasets, and lexicons. Therefore, we propose a rule-based Ge'ez morphological synthesizer that generates surface words from root words according to the morphological structures of the language. We used 1,102 sample verbs, representing all verb morphological structures, to test and evaluate the system. The system achieves a performance of 97.4%, outperforming the baseline model and suggesting that future work should build a comprehensive system considering the morphological variations of the language. Keywords: Ge'ez, NLP, morphology, morphological synthesizer, rule-based
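A rule-based synthesizer of this kind can be sketched as templatic pattern filling: root consonants are slotted into a vowel template to produce a surface form. The root and pattern below use Latin transliteration and are purely illustrative; actual Ge'ez rules operate on the Ge'ez script and its verb classes.

```python
def synthesize(root, pattern):
    """Fill a templatic vowel pattern with a triliteral root's consonants.
    Slots C1..C3 mark root-consonant positions; the rest is the pattern's
    vocalic material. Illustrative only (Latin transliteration, made-up rule)."""
    c1, c2, c3 = root
    return pattern.replace("C1", c1).replace("C2", c2).replace("C3", c3)

# hypothetical perfective-style template applied to the root q-t-l
print(synthesize(("q", "t", "l"), "C1äC2äC3ä"))  # qätälä
```

A full system would chain many such rules, conditioned on verb class, tense, person, number, and gender.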


Perceptions of AI Across Sectors: A Comparative Review of Public Attitudes

Bialy, Filip, Elliot, Mark, Meckin, Robert

arXiv.org Artificial Intelligence

Even though the current generation of AI is underpinned by a common technology - namely machine learning, especially in the form of deep learning - in the public eye it has not emerged as a single solution. Rather, it has taken shape through multiple and overlapping applications - ranging from predictive diagnostics in healthcare and algorithmic hiring systems in HR to autonomous weapons and generative language models. As AI becomes increasingly embedded in sector-specific infrastructures, the question of how publics perceive its use is gaining urgency. Existing literature on public perception of AI suggests that attitudes are highly sensitive to the application domain. People tend to be more supportive of AI in domains where it is perceived to augment human capacity (e.g., in medical diagnostics) and more sceptical when AI is seen as replacing judgement or threatening civil liberties or rights (e.g., in security or surveillance). These perceptions are shaped not only by technical features of the AI system but also by institutional trust, cultural attitudes toward risk, and the moral economy of the domain in question. Despite this, few reviews have systematically compared public perceptions across sectors and explored the cross-domain patterns and differences in attitudes.


Latent Action Pretraining Through World Modeling

Tharwat, Bahey, Nasser, Yara, Abouzeid, Ali, Reid, Ian

arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models have gained popularity for learning robotic manipulation tasks that follow language instructions. State-of-the-art VLAs, such as OpenVLA and $π_{0}$, were trained on large-scale, manually labeled action datasets collected through teleoperation. More recent approaches, including LAPA and villa-X, introduce latent action representations that enable unsupervised pretraining on unlabeled datasets by modeling abstract visual changes between frames. Although these methods have shown strong results, their large model sizes make deployment in real-world settings challenging. In this work, we propose LAWM, a model-agnostic framework to pretrain imitation learning models in a self-supervised way, by learning latent action representations from unlabeled video data through world modeling. These videos can be sourced from robot recordings or videos of humans performing actions with everyday objects. Our framework is designed to be effective for transferring across tasks, environments, and embodiments. It outperforms models trained with ground-truth robotics actions and similar pretraining methods on the LIBERO benchmark and real-world setup, while being significantly more efficient and practical for real-world settings.